Abstract

We present a new model for LT codes which simplifies the analysis of the error probability of 
decoding by belief propagation. For any given degree distribution, we provide the first rigorous 
expression for the limiting error probability as the length of the code goes to infinity via recent 
results in random hypergraphs [2]. For a code of finite length, we provide an algorithm for
computing the probability of error of the decoder. This algorithm improves on that of Karp,
Luby, and Shokrollahi [5] by a linear factor.


1 Introduction 

Fountain codes were originally introduced in [?] and were designed for robust and scalable transmission
of data over lossy networks. Given a vector of input symbols $(x_1, x_2, \ldots, x_k)$, a fountain code
generates a stream of output symbols to be sent over the network. Each output symbol is generated 
independently by sampling from a fixed distribution on subsets of the input symbols and adding 
the symbols in the chosen subset. The sequence of output symbols, together with the positions 
of the input symbols whose sum they represent, is sent over a lossy network. The input word is 
decoded using the belief propagation algorithm which takes only linear time. The probability that 
the belief propagation decoder fails depends on the distribution from which output symbols were 
generated and on the number n of output symbols received. 

Analysis of the error probability to date has been carried out under the assumption of a fixed 
number of received output symbols n. Here we will change this assumption and say that the 
number of output symbols received is a random variable with mean n. This assumption makes 
sense in applications and is not significantly different from the case of a fixed number of output
symbols, because the random variable is highly concentrated around n. We will define the exact 
distribution in the following section. We refer to this as the Poisson model because the number of 
output symbols approaches the Poisson distribution as k goes to infinity. Intuitively, the Poisson 
model adds further independence between the random variables involved in the error-probability 
calculations, and thus significantly simplifies the analysis. 

We will apply the new model to the analysis of a particular kind of fountain code, the LT
codes introduced by Michael Luby in [?]. The output symbols in LT codes are generated in the
following way: a degree $d$ is chosen from a fixed probability distribution $\Omega = (\Omega_1, \Omega_2, \ldots, \Omega_k)$ on the set
$\{1, 2, \ldots, k\}$, after which the parity of $d$ random input symbols is computed.
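
As an illustration, the generation of a single output symbol can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the function names `sample_degree` and `lt_output_symbol`, the list `omega` (with `omega[d-1]` standing for $\Omega_d$), and the choice of $d$ distinct positions uniformly at random are all assumptions made for the example.

```python
import random

def sample_degree(omega, rng):
    """Draw a degree d from omega, where omega[d-1] is assumed to hold Omega_d."""
    degrees = list(range(1, len(omega) + 1))
    return rng.choices(degrees, weights=omega, k=1)[0]

def lt_output_symbol(bits, omega, rng):
    """Return (neighbors, value) for one output symbol over GF(2).

    Assumes len(omega) <= len(bits), so that d distinct positions exist.
    """
    d = sample_degree(omega, rng)
    # Choose d distinct input positions uniformly at random.
    neighbors = sorted(rng.sample(range(len(bits)), d))
    value = 0
    for i in neighbors:
        value ^= bits[i]  # parity (sum mod 2) of the chosen input symbols
    return neighbors, value
```

The neighbor list is returned together with the value because the decoder needs the positions of the input symbols whose sum each output symbol represents.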

We are interested in two questions. Firstly, we look for an analytic expression for the limiting 
error-probability of belief propagation. The second question is that of designing an algorithm to 
compute the error-probability for finite-length codes. 
Figure 1: Round nodes denote input symbols and square nodes denote output symbols. Every
subset of input symbols corresponds to a potential output symbol, taking as value their sum mod
2. Each output symbol is generated independently with probability that depends on the size of the
set. The solid square nodes correspond to output symbols that were generated and received.

The asymptotic analysis of LT codes to date has been based on a heuristic calculation, using 
the fact that in the limit of k going to infinity the belief propagation iterations behave as if on a 
tree graph. With the new model we can apply recent results in the analysis of processes on random 
hypergraphs [2] to give an exact expression for the portion of symbols that can be decoded by belief
propagation as k goes to infinity. 

For the finite-length analysis of LT codes, Karp, Luby, and Shokrollahi [5] proposed a dynamic
programming algorithm. The size of the table is $O(n^3)$ and each entry is computed using $O(n^2)$ of
the previous entries. Using a generating-polynomial representation and fast multipoint evaluation
and interpolation of polynomials, the complexity of the algorithm is $O(n^3 \log^2 n)$. The Poisson
model permits us to reduce the dynamic programming recursion to a table of size $O(k^2)$, and each
entry is computed from $O(k)$ of the previous ones. Using the generating-polynomial representation,
the complexity is reduced to $O(k^2 \log k)$.

In the next section we review the factor graph representation of LT codes and the belief
propagation algorithm for them, and we define the new model precisely. Section 3 is dedicated to
the asymptotic analysis, and Section 4 to the finite-length analysis. We conclude with a brief
discussion of open problems.

2 Background and Definitions 

It is convenient to think of the set of input and output symbols as the vertices of a bipartite 
graph. Every output symbol is connected by an edge to all input symbols in the set whose sum it
represents, as in Figure 1.

2.1 Belief Propagation 

In the setting of fountain codes the belief propagation algorithm is very simple. If there is an output 
symbol with a single undecoded neighbor, then the value of that input symbol can be computed. 
In this case, we say that a decodable input symbol becomes uncovered or decoded. Uncovering one 
symbol may result in other input symbols becoming decodable, and so on. The process stops when 
there are no decodable input symbols, or equivalently, there are no output symbols with a single 
undecoded neighbor. We refer to the set of decodable input symbols as the ripple (note that in [5]
the ripple is, instead, the set of output symbols that have only one undecoded symbol). At every
step, one input symbol leaves the ripple and 0 or more input symbols join the ripple.
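
The uncovering process described above can be sketched as a simple peeling loop. This is a minimal sketch under stated assumptions: the name `peel_decode` and the representation of each output symbol as a `(neighbors, value)` pair are hypothetical, and the loop rescans all output symbols on every pass, so it is quadratic rather than the linear-time procedure mentioned in the introduction.

```python
def peel_decode(k, outputs):
    """Decode by repeatedly resolving output symbols with one undecoded neighbor.

    outputs: list of (neighbors, value) pairs, where neighbors is an iterable of
    input indices in range(k) and value is the parity of those inputs.
    Returns a list of k bits, with None for input symbols that were not recovered.
    """
    decoded = [None] * k
    pending = [(set(nb), val) for nb, val in outputs]
    progress = True
    while progress:
        progress = False
        for idx, (nb, val) in enumerate(pending):
            # Fold the values of already-decoded neighbors into this symbol.
            for i in [j for j in nb if decoded[j] is not None]:
                nb.discard(i)
                val ^= decoded[i]
            pending[idx] = (nb, val)
            if len(nb) == 1:
                (i,) = nb
                decoded[i] = val  # a single undecoded neighbor is determined
                progress = True
    return decoded
```

The loop stops exactly when no output symbol has a single undecoded neighbor, which is the stopping condition stated above.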

2.2 The Poisson model for LT codes 

For a given set of input symbols of size $d$, let $p_d = n\Omega_d / \binom{k}{d}$ be the probability that an output symbol
representing the parity of this set was received. Then by linearity of expectation the expected
number of distinct output symbols is exactly $n$ and the expected number of output symbols from
sets of size $d$ is $n\Omega_d$.
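
The following sketch computes the per-subset probabilities and checks the expectation claim numerically. The function names and the list `omega` (with `omega[d-1]` standing for the probability of degree $d$) are assumptions made for the example; it also assumes the parameters are such that every computed probability is at most 1.

```python
from math import comb

def subset_probabilities(k, n, omega):
    """Return [p_1, ..., p_D], where p_d = n * Omega_d / C(k, d)."""
    return [n * w / comb(k, d) for d, w in enumerate(omega, start=1)]

def expected_counts(k, n, omega):
    """Expected number of output symbols of each degree: C(k, d) * p_d."""
    p = subset_probabilities(k, n, omega)
    return [comb(k, d) * p_d for d, p_d in enumerate(p, start=1)]
```

Summing `expected_counts(k, n, omega)` gives back `n` (up to rounding) whenever `omega` sums to 1, which is the linearity-of-expectation statement above.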

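Let $D$ be the largest degree with positive probability. Let $N_d$, for $d = 1, \ldots, D$, be a random
variable denoting the total number of output symbols of degree $d$, and let $N = \sum_{d=1}^{D} N_d$ be the
total number of output symbols. The distribution of $N_d$ is binomial, $B\left(\binom{k}{d}, p_d\right)$. By the concentration
inequalities in [1], for any $\lambda > 0$:

$$\Pr[N > n + \lambda] < \exp\left(-\frac{3\lambda^2}{2(3n + \lambda)}\right), \tag{1}$$

$$\Pr[N < n - \lambda] < \exp\left(-\frac{\lambda^2}{2n}\right). \tag{2}$$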

3 Asymptotic Analysis of LT Codes 

The above random model is almost identical to the Poisson random hypergraph model of Darling 
and Norris [2]. The process that they study, called the hypergraph collapse process, is identical
to the uncovering of input symbols in the belief propagation algorithm. In order to restate their
result in our setting, we need some notation. Let $n = (1 + \delta)k$ for some constant $\delta > 0$. Let
$\beta_1 = -\ln(1 - (1 + \delta)\Omega_1)$ and $\beta_d = (1 + \delta)\Omega_d$, for $d = 2, \ldots, k$. From these we define the power
series

$$\beta(t) = \sum_{d \ge 0} \beta_d t^d$$

and its derivative:

$$\beta'(t) = \sum_{d \ge 1} d \beta_d t^{d-1}.$$

The statement of the theorem is in terms of the roots of the function $\beta'(t) + \log(1 - t)$. Let

$$z^* = \inf\{t \in [0, 1) : \beta'(t) + \log(1 - t) < 0\} \wedge 1$$

and suppose there are no roots of $\beta'(t) + \log(1 - t)$ in $[0, z^*)$. Notice that in particular if $\beta$ is a
polynomial (as is the case in the LT-codes setting) then $z^* < 1$.

Theorem 1 [2] Assuming $z^* < 1$ and there are no roots of $\beta'(t) + \log(1 - t)$ in $[0, z^*)$, then as $k$
goes to infinity the fraction of recoverable input symbols goes to $z^*$ in probability.

Therefore as a first test for the quality of a particular degree distribution, one can compute the
roots of $\beta'(t) + \log(1 - t)$. In fact, $(1 - t)(\beta'(t) + \log(1 - t))$ is the expected fraction of output
symbols which have a unique undecoded neighbor, when a fraction $t$ of the input symbols has been
decoded. This is equivalent to the expression obtained from the tree analysis.
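
Such a first test can be carried out numerically. The sketch below scans a grid on $[0, 1)$ for the first point at which the function is negative, which approximates the infimum only up to the grid spacing. The names `beta_prime` and `z_star`, the parameter `steps`, and the list `omega` (with `omega[d-1]` standing for the probability of degree $d$) are assumptions made for the example; it assumes the degree-1 probability scaled by $1 + \delta$ is below 1, so that the logarithm in the first coefficient is defined.

```python
import math

def beta_prime(t, omega, delta):
    """Evaluate beta'(t) = sum_{d >= 1} d * beta_d * t^(d-1)."""
    # d = 1 term: beta_1 = -ln(1 - (1 + delta) * Omega_1).
    total = -math.log(1 - (1 + delta) * omega[0])
    # d >= 2 terms: beta_d = (1 + delta) * Omega_d.
    for d in range(2, len(omega) + 1):
        total += d * (1 + delta) * omega[d - 1] * t ** (d - 1)
    return total

def z_star(omega, delta, steps=10000):
    """Grid approximation of inf{t in [0,1): beta'(t) + log(1-t) < 0}, capped at 1."""
    for i in range(steps):
        t = i / steps
        if beta_prime(t, omega, delta) + math.log(1 - t) < 0:
            return t
    return 1.0
```

For instance, `z_star([0.1, 0.9], 0.1)` evaluates the criterion for a toy distribution that puts weight 0.1 on degree 1 and 0.9 on degree 2, with 10% overhead.
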
4 Finite-length Analysis of LT Codes


For codes of finite length, we are interested in calculating the probability that all input symbols 
can be recovered. In this section we give an algorithm for computing this probability for a given 
degree distribution. 

4.1 Recursion of probability distributions 

Let $X_u$, for $u = 1, \ldots, k$, be random variables that denote the size of the ripple when $u$ symbols
are undecoded, or equivalently, as we will sometimes say, at step $u$. In particular, $X_k$ is the number
of input symbols for which a degree-1 output symbol was generated. This number has a binomial
distribution $B(k, p_1)$. The decoding process stops when $X_u = 0$. The distribution of the size of the
ripple at step $u - 1$ depends only on the size of the ripple at step $u$. If $X_u = 0$ then the process
stops and $X_{u-1} = 0$. If $X_u > 0$, then one of the symbols in the ripple is decoded. This results in
$Y_u$ input symbols joining the ripple. $Y_u$ has the binomial distribution $B(u - X_u, q_u)$,
where $q_u$ is the probability that a symbol joins the ripple at step $u$ (when the $(k - u + 1)$-st symbol
is decoded). An input symbol $a$ joins the ripple at this time if and only if there is an output symbol
with neighbors: the symbol $a$, the last decoded symbol, and any set of symbols among the other
$k - u$ decoded symbols. Therefore,

$$q_u = 1 - \prod_{d=2}^{\min\{D, k-u+2\}} (1 - p_d)^{\binom{k-u}{d-2}}.$$

Finally, $X_{u-1} = X_u - 1 + Y_u$. Therefore, for every $1 \le r \le u - 1$ and $1 \le s \le r + 1$,

$$\Pr[X_{u-1} = r \mid X_u = s] = \Pr[Y_u = r - s + 1] = \binom{u-s}{r-s+1} q_u^{r-s+1} (1 - q_u)^{u-r-1}.$$

This gives us an expression for the distribution of $X_{u-1}$ in terms of the distribution of $X_u$:

$$\Pr[X_{u-1} = r] = \sum_{s=1}^{r+1} \Pr[X_u = s] \Pr[X_{u-1} = r \mid X_u = s].$$

The probability that belief propagation cannot complete the decoding is exactly the probability
that $X_1 = 0$. This can be computed by dynamic programming. Let $Q(u, r)$ be $\Pr[X_u = r]$, for
every $u = 0, \ldots, k$, and $r = 0, \ldots, u$. Then

$$Q(k, r) = \binom{k}{r} p_1^r (1 - p_1)^{k-r}, \quad \text{for } r = 0, \ldots, k,$$

$$Q(u-1, r) = \sum_{s=1}^{r+1} Q(u, s) \binom{u-s}{r-s+1} q_u^{r-s+1} (1 - q_u)^{u-r-1}, \quad \text{for } r = 1, \ldots, u-1,$$

$$Q(u-1, 0) = Q(u, 0) + Q(u, 1)(1 - q_u)^{u-1}.$$

Finally, the probability of error of the decoder is $Q(1, 0)$.
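
The recursion can be evaluated directly, as in the sketch below. This is a naive cubic-time version written straight from the recursion, not the faster generating-polynomial method; the name `failure_probability` and the list `omega` (with `omega[d-1]` standing for the probability of degree $d$) are assumptions made for the example, and the floating-point powers limit it to small $k$.

```python
from math import comb

def failure_probability(k, n, omega):
    """Return Q(1, 0), the probability that the ripple empties before all
    k input symbols are decoded, in the model with mean n output symbols."""
    D = len(omega)
    # p[d] = n * Omega_d / C(k, d); index 0 is unused.
    p = [0.0] + [n * omega[d - 1] / comb(k, d) for d in range(1, D + 1)]
    # Q[r] = Pr[X_u = r]; start at u = k, where X_k ~ B(k, p_1).
    Q = [comb(k, r) * p[1] ** r * (1 - p[1]) ** (k - r) for r in range(k + 1)]
    for u in range(k, 1, -1):  # move from step u to step u - 1
        one_minus_q = 1.0
        for d in range(2, min(D, k - u + 2) + 1):
            one_minus_q *= (1 - p[d]) ** comb(k - u, d - 2)
        q = 1 - one_minus_q
        new = [0.0] * u  # entries for r = 0, ..., u - 1
        new[0] = Q[0] + Q[1] * one_minus_q ** (u - 1)
        for r in range(1, u):
            new[r] = sum(
                Q[s] * comb(u - s, r - s + 1)
                * q ** (r - s + 1) * one_minus_q ** (u - r - 1)
                for s in range(1, r + 2)
            )
        Q = new
    return Q[0]
```

With only degree-1 symbols, decoding succeeds exactly when every input symbol has its own degree-1 output symbol, which gives a simple sanity check for small cases.
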
4.2 Complexity of the algorithm

To compute the values of $1 - q_u$ for $u = 1, \ldots, k$, we proceed again by dynamic programming. We
store the values of all of the factors $f(u, d) = (1 - p_d)^{\binom{k-u}{d-2}}$, for every $d = 2, 3, \ldots, \min\{D, k - u + 2\}$.
There are $O(Dk)$ entries and $f(u-1, d) = f(u, d)^{\frac{k-u+1}{k-u-d+3}}$, which takes $O(\log k)$ operations to
compute. Therefore precomputing $q_u$ takes $O(Dk \log k)$ operations.
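
The exponent ratio comes from $\binom{k-u+1}{d-2} / \binom{k-u}{d-2} = (k-u+1)/(k-u-d+3)$. The sketch below applies this recurrence in the log domain, where raising to a power becomes a multiplication; working with logarithms is a choice made for this example rather than something prescribed above. The name `one_minus_q_table` and the indexing convention are assumptions, and each probability in `p` is assumed to be strictly below 1.

```python
import math

def one_minus_q_table(k, p):
    """Return a list t with t[u] = 1 - q_u for u = 1..k (t[0] is unused).

    p[d] is assumed to hold p_d for d = 2..D; p[0] and p[1] are ignored.
    """
    D = len(p) - 1
    log_total = [0.0] * (k + 1)
    for d in range(2, D + 1):
        u = k - d + 2  # largest u for which the factor f(u, d) appears
        if u < 1:
            continue
        log_f = math.log1p(-p[d])  # f(k-d+2, d) = (1 - p_d)^1
        while u >= 1:
            log_total[u] += log_f
            # f(u-1, d) = f(u, d)^((k-u+1)/(k-u-d+3)), taken in logs.
            log_f *= (k - u + 1) / (k - u - d + 3)
            u -= 1
    return [math.exp(x) for x in log_total]
```

The result can be compared against the direct product of $(1 - p_d)^{\binom{k-u}{d-2}}$ over $d$ for small parameters.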

The recursion for $Q(u, r)$ can be computed more efficiently if we represent it by generating
polynomials. We will proceed in a manner similar to [5]. Let $Q_u(x) = \sum_{r=1}^{u} Q(u, r) x^{r-1}$. Then the
recursion can be written as:

$$Q_k(x) = \frac{1}{x}\left((p_1 x + (1 - p_1))^k - (1 - p_1)^k\right),$$

$$Q_{u-1}(x) = \frac{1}{x}\left((q_u x + (1 - q_u))^{u-1}\, Q_u\!\left(\frac{x}{q_u x + (1 - q_u)}\right) - (1 - q_u)^{u-1} Q_u(0)\right).$$
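
For completeness, the polynomial form of the recursion step can be checked as follows. This is a worked derivation under the definitions above, substituting $j = r - s + 1$ and using the binomial theorem; the index $j$ is introduced only for this calculation.

```latex
\begin{align*}
x\,Q_{u-1}(x)
  &= \sum_{r=1}^{u-1} \sum_{s=1}^{r+1} Q(u,s) \binom{u-s}{r-s+1}
     q_u^{\,r-s+1} (1-q_u)^{u-r-1} x^{r} \\
  &= \sum_{s=1}^{u} Q(u,s)\, x^{s-1} \sum_{j=0}^{u-s} \binom{u-s}{j}
     (q_u x)^{j} (1-q_u)^{u-s-j} \;-\; Q(u,1)(1-q_u)^{u-1} \\
  &= \sum_{s=1}^{u} Q(u,s)\, x^{s-1} \bigl(q_u x + (1-q_u)\bigr)^{u-s}
     \;-\; (1-q_u)^{u-1} Q_u(0) \\
  &= \bigl(q_u x + (1-q_u)\bigr)^{u-1}\,
     Q_u\!\left(\frac{x}{q_u x + (1-q_u)}\right)
     \;-\; (1-q_u)^{u-1} Q_u(0).
\end{align*}
```

The subtracted term is the pair $(s, j) = (1, 0)$, which corresponds to $r = 0$ and is excluded from the sum defining $Q_{u-1}(x)$; it equals $Q(u, 1)(1 - q_u)^{u-1}$, and $Q(u, 1) = Q_u(0)$.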


Finally, the probability of success is $Q_1(x)$ (which is a constant). We compute the sequence of
polynomials $Q_u(x)$ for $u = k, \ldots, 1$ in the following way. Suppose we have computed the coefficients of
$Q_u$. We choose $k$ non-zero points $x_1, x_2, \ldots, x_k$, and compute $y_i = \frac{x_i}{q_u x_i + (1 - q_u)}$ for $i = 1, \ldots, k$. We
evaluate $Q_u$ at the points $y_i$ using the multipoint evaluation algorithm, which takes $O(\log k)$
operations per point. Given these values, evaluating $Q_{u-1}(x_i)$ takes another $O(\log k)$ operations. We
can then interpolate the coefficients of $Q_{u-1}$ by fast polynomial interpolation, which takes $O(k \log k)$
operations. Therefore, the time complexity of our algorithm is $O(k^2 \log k)$.


4.3 Implications for the case with fixed number of output symbols 

The algorithm above outputs the probability that belief propagation fails, given that the number
of output symbols is a random variable with expected value $n$, as described. We denote this
probability by $P_p(n)$. We can use this algorithm to get bounds on the probability that belief
propagation fails when there is a fixed number of output symbols, which we denote by $P_f(n)$. We
use the fact that the probability that the decoder fails is monotone decreasing in the number of
output symbols, and

$$P_p(n) = \sum_{\tilde{n}=0}^{2^k - 1} \Pr[N = \tilde{n}] \, P_f(\tilde{n}).$$


Let $n_1 < n < n_2$. Using the concentration inequalities (1) and (2) we get the bounds:

$$P_p(n_1) - \exp\left(-\frac{3(n - n_1)^2}{2(2n_1 + n)}\right) \le P_f(n) \le \frac{P_p(n_2)}{1 - \exp\left(-\frac{(n_2 - n)^2}{2 n_2}\right)}.$$


5 Discussion 

The approach presented here is applicable to general fountain codes, as well as to some classes of
LDPC codes. Di et al. [3] gave algorithms for the finite-length analysis of regular LDPC codes
(i.e. left and right degrees are constant). Our method is applicable to codes with Poissonian degree
distribution on the right.